Wolbachia are maternally transmitted intracellular bacteria that invade insect populations by 
manipulating their reproduction and immunity and thus limiting the spread of numerous 
human pathogens. Experimental Wolbachia infections can reduce Plasmodium numbers in 
Anopheles mosquitoes in the laboratory; however, natural Wolbachia infections in field 
anophelines have never been reported. Here we show evidence of Wolbachia infections in 
Anopheles gambiae in Burkina Faso, West Africa. Sequencing of the 16S rRNA gene identified 
Wolbachia sequences in both female and male germlines across two seasons, and determined 
that these sequences are vertically transmitted from mother to offspring. Whole-genome 
sequencing of positive samples suggests that the genetic material identified in An. gambiae 
belongs to a novel Wolbachia strain, related to but distinct from strains infecting other 
arthropods. The evidence of Wolbachia infections in natural Anopheles populations promotes 
further investigations on the possible use of natural Wolbachia-Anopheles associations to limit 
malaria transmission. 
Malaria-transmitting Anopheles mosquitoes are the deadliest 
animals on this planet, causing the death of more 
than 600,000 people each year and endangering the 
lives of half of the world's population. Current insecticide-based 
control strategies to stop malaria transmission by targeting the 
mosquito vector are limited by the rapid spread of insecticide 
resistance. In addition, these interventions target only indoor 
feeding and resting populations, with the use of insecticide-treated 
bednets and the application of indoor residual sprays, 
respectively. For decades, the use of Wolbachia endosymbionts 
has been proposed as an alternative to chemical strategies because 
of the ability of these bacteria to rapidly invade insect populations 
via cytoplasmic incompatibility, and successful Wolbachia 
invasions in field settings have been demonstrated in the case 
of the dengue and yellow fever vector Aedes aegypti. Recent 
proof that Wolbachia infections of Anopheles vectors limit the 
development of the Plasmodium parasites that cause malaria 
makes these bacteria a particularly attractive tool for the control 
of both endo- and exophagic populations of malaria-transmitting 
anophelines. Long-standing limitations concerning the 
introduction of Wolbachia into laboratory colonies of Anopheles 
mosquitoes have been recently overcome; however, the 
usefulness of this system for the control of Anopheles 
populations has been undermined by the apparent absence of 
natural infections. Indeed, while Wolbachia strains have been 
detected in many insects, attempts to identify these bacteria in 
field Anopheles have failed, promoting the belief that these 
mosquitoes are not natural hosts for Wolbachia. 

In this study, we report evidence of natural Wolbachia 
infections in two incipient species of the major malaria vector 
Anopheles gambiae. We isolate Wolbachia-specific 16S rRNA 
sequences from the reproductive organs and carcasses of adult 
mosquitoes and from larval carcasses in three different villages in 
Burkina Faso, West Africa. Whole-genome shotgun sequencing 
of two positive samples reveals a previously uncharacterized 
Wolbachia strain that is maternally transmitted in laboratory 
settings. These results open new avenues for exploiting Wolbachia 
infections for field applications targeting the malaria mosquito. 

Results 

Wolbachia sequences detected in field An. gambiae populations. 
We collected mating couples from natural An. gambiae 
mating swarms in Burkina Faso, West Africa, to identify the 
microbial populations of the male and female reproductive tracts. 
We analysed two reproductively isolated populations of An. 
gambiae, the M and S molecular forms, which do not interbreed 
in the field and are now classified as two separate species 
(An. coluzzii and An. gambiae, respectively). For simplicity, in 
the text and in the tables, we will refer to these two species as M 
and S forms. Our collections included 81 couples captured in three 
different villages: two villages in Vallee du Kou (VK5 and VK7), 
where the M form is prevalent, and one village in Soumousso, 
primarily populated with S form (Fig. 1, Supplementary Table 1). 

An initial high-throughput sequencing of the 16S rRNA gene 
(variable region V4, average 23,589, s.d. 16,070 assembled reads 
per sample) amplified from ovaries and testes dissected from 30 
mating couples produced a molecular fingerprint of the bacterial 
population of these reproductive tissues. Surprisingly, this 
analysis identified one sample, derived from the testes of an 
S form male collected in Soumousso, infected with Wolbachia. 
This sample contained 21.8% of reads (5,412 out of 24,800) 
matching the V4 regions of the Wolbachia 16S rRNA gene, with a 
percentage identity ranging between 95.3 and 97.6%. This 
percentage is fully consistent with the overall diversity of known 
Wolbachia V4 sequences (average identity 95.9%, s.d. 1.75%, 
median 96.0% as estimated from available sequences, see 
Methods), while it is incompatible with any other sequenced 
bacterial 16S gene (closest matches at <89% identity). Phylogenetic 
analysis placed the Wolbachia sequences in one of the two 
main subtrees of the genus, further confirming their taxonomic 
placement (Fig. 2). 

The identification of Wolbachia sequences in the testes of an 
An. gambiae male prompted us to analyse the remaining 
mosquito specimens collected from the same mating swarms in 
the three villages. To this aim, a more sensitive PCR-based 
amplification and sequencing of a 16S rRNA gene segment 
comprising three variable regions (V6, V7 and V8) was utilized to 
specifically detect Wolbachia in ovaries and testes dissected from 
the remaining 51 mating couples, using previously validated 
primers. The mosquito carcasses were also examined using the 
same method. Out of the 102 mosquitoes analysed, 11 were 
positive for Wolbachia 16S rRNA, leading to a frequency of 
infection of 10.8% similarly distributed between males and 
females (5 males and 6 females) (Supplementary Table 1). 
Interestingly, Wolbachia sequences were PCR amplified from 
either the reproductive tissues or from the carcasses, but never 
from both, with the exception of one male, where 16S sequences 
were identified by high-throughput sequencing in the testes and 
by PCR in the carcass. Although sequences corresponding to 
these endosymbionts were more prevalent in M mosquitoes, the 
difference in frequency between the two species was not 
statistically significant (12.8% in M samples compared with 
4.2% in S samples). We also amplified DNA from fourth instar 
larvae collected from the same three villages (100 specimens from 
Soumousso, 80 specimens from VK5 and 70 specimens from 
VK7) using the same primer sets described above. Wolbachia 
sequences were found in five larval samples (three from 
Soumousso, one from VK5 and one from VK7), suggesting 
occurrence of maternal transmission (Supplementary Table 2). 
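
The frequencies quoted above can be reproduced from the counts implied by the text (10 of 78 M-form adults and 1 of 24 S-form adults). The text does not name the statistical test used for the M versus S comparison; the sketch below assumes a two-sided Fisher's exact test, chosen here only because the counts are small, and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(pos_a, n_a, pos_b, n_b):
    """Two-sided Fisher's exact test for a 2x2 table of positives/negatives.

    Assumption: this test is chosen for illustration; the source text
    does not state which test was applied.
    """
    total = n_a + n_b
    positives = pos_a + pos_b

    def prob(k):
        # Hypergeometric probability of k positives falling in group A.
        return comb(positives, k) * comb(total - positives, n_a - k) / comb(total, n_a)

    observed = prob(pos_a)
    lo = max(0, positives - n_b)
    hi = min(positives, n_a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Counts consistent with the percentages in the text.
m_pos, m_n = 10, 78   # M form (VK5 + VK7)
s_pos, s_n = 1, 24    # S form (Soumousso)

print(f"M: {100 * m_pos / m_n:.1f}%  S: {100 * s_pos / s_n:.1f}%")
print(f"overall: {100 * (m_pos + s_pos) / (m_n + s_n):.1f}%")
print(f"two-sided p = {fisher_exact_two_sided(m_pos, m_n, s_pos, s_n):.2f}")
```

With only 11 positives in total, a p-value above 0.05 is expected, which agrees with the statement that the difference between the two species is not statistically significant.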

Wolbachia sequences group into two different clusters. Analysis 
of the amplified regions determined the presence of at least 
two distinct clusters of Wolbachia sequences (Supplementary 
Fig. 1). One cluster was identical to the reference strain wAlbB 
isolated from Aedes albopictus, while the second was closely related 
to several wPip strains isolated from Culex mosquitoes and 
Drosophila species. The wAlbB-like cluster was detected only in 
reproductive tissues (ovaries and testes), while wPip-like 
sequences were more widespread and were found in ovaries, 
carcasses and whole larvae (Supplementary Fig. 1). Although 
these Wolbachia ribosomal sequences are not host specific, the 
identification of two distinct clusters suggests the occurrence of 
independent Wolbachia infections in An. gambiae, as observed in 
other hosts. 

We next analysed the distribution of the Wolbachia sequences 
among the different villages. A larger number of positive 
individuals were isolated from mating swarms in VK5, where 7 
out of the 36 collected mosquitoes showed evidence of infection 
(19.4%). The other 4 positive samples were found in VK7 
(3 samples out of 42, 7.1%) and Soumousso (1 sample out of 24, 
4.2%) (Supplementary Table 1). Strikingly, six of the seven positive 
samples isolated from VK5 had been collected in just two of the 
seven mating swarms analysed in that village (Supplementary 
Table 1). If confirmed, such clustering of Wolbachia-positive 
individuals in specific swarms would suggest that ecological and 
environmental factors might play a key role in the establishment of 
Wolbachia infections in the An. gambiae host. 

The Wolbachia strain belongs to a new phylogenetic group. To 
expand the 16S rRNA-based analysis and better characterize the 

Figure 1 | Mosquito collection sites and distribution of Wolbachia-infected individuals. The maps describe the three villages (Soumousso, VK5 and VK7) 
where An. gambiae couples from different mating swarms (indicated by circles) and larvae from different breeding sites (indicated by squares) were 
collected. Swarm sites and larval breeding grounds are identified by numbers and letters, respectively (see Supplementary Tables 1 and 2). Sites where 
Wolbachia-positive mosquitoes were found are highlighted in red. Maps were adapted with permission from originals courtesy of A. A. Millogo, 
IRSS/Centre Muraz, Bobo-Dioulasso, Burkina Faso. 



Wolbachia strain found in An. gambiae, we performed whole- 
genome shotgun (WGS) metagenomic sequencing of two of the 
Wolbachia-positive samples from mosquito ovaries 
(Supplementary Table 3). Sequences corresponding to the An. 
gambiae genome were screened out, and a pipeline for detecting 
Wolbachia-specific sequences based on the unique marker 
approach was applied (see Methods). A total of 571 reads 
uniquely attributable to Wolbachia were detected (Fig. 3a,b, 
Supplementary Fig. 3a,b) with 86 ± 3.9 and 87.0 ± 4.4% average 
identity for the two samples analysed, which is in line with the 
sequence divergence observed for Wolbachia strains in different 
supergroups (Supplementary Fig. 2). These sequences matched 
134 Wolbachia genes belonging to different functional categories 
(Fig. 3c, Supplementary Fig. 3c). The majority of reads (32.2% on 
average across the two samples) matched genes from metabolic 
pathways, while a relevant number of reads (13.3%) corresponded 
to Wolbachia-specific transposases; this is in line with the 
observation that transposases abundantly populate the genome of 
Wolbachia strains from many insects including mosquitoes (for 
example, transposases correspond to 8.32% of all genes annotated 
in the wPip genome). Alignment of our reads to eight fully or 
partially sequenced Wolbachia reference genomes 
(Supplementary Table 4) demonstrated that the strain identified 
here belongs to a potential Anopheles-specific phylogenetic 
supergroup, distinct from the arthropod-associated and 
evolutionarily related supergroups A and B (Fig. 3d, 
Supplementary Fig. 3d). We henceforth call this strain wAnga. 

wAnga is maternally transmitted. The presence of Wolbachia 
DNA in the reproductive tissues of female adults prompted the 
question of whether these bacteria are inherited from mother to 
offspring, an essential prerequisite for their spread through a 
population. To obtain irrefutable proof of vertical transmission 
and estimate its efficiency, semi-gravid females were collected 
from houses in VK5 two seasons after the initial collections. The 
same sets of Wolbachia 16S fragments were identified at a 
frequency of 21% (19 out of 91 females), and the progenies of the 
5 Wolbachia-positive females that laid eggs were then analysed 
for infection. Occurrence of maternal transmission was detected 
in all progenies, with an average transmission frequency of 
68% (ranging from 56 to 100%) (Fig. 4). Taken together, these 
data confirm the presence of Wolbachia sequences over the 
course of 2 years and indicate occurrence of vertical transmission 
from mother to offspring, as normally observed in Wolbachia 
infections. 



Discussion 

The identification of genomic sequences from a novel Wolbachia 
strain in two incipient species of An. gambiae over the course of 
different seasons suggests that anopheline mosquitoes naturally 
harbour these bacteria, prompting renewed efforts to exploit 
Wolbachia to block malaria transmission. Past attempts to 
identify Wolbachia in these mosquitoes may have failed due to 
possible methodological limitations in the detection systems used, 
including non-optimal DNA amplification and extraction 
methods, and size of the sampled mosquito population. In 
addition, the newly identified wAnga strain appears to be highly 
divergent from Wolbachia strains isolated in other insects. 
Indeed, our attempts to amplify by PCR two Wolbachia-specific 
genes commonly used in phylogenetic analyses, the Wolbachia 
surface protein wsp and the fructose-bisphosphate aldolase fbpA, 
were repeatedly unsuccessful (see Supplementary 
Table 5 for primer sets used), suggesting a low degree of sequence 
conservation. Interestingly, in some Wolbachia strains infecting 
C. pipiens, the wspB gene is disrupted by the insertion of an IS256 
transposon, which belongs to the same family of transposons 
identified in wAnga. Similar transposon insertions into wsp may 
have occurred in the Wolbachia strain infecting An. gambiae, 
compromising the amplification of this gene. 

Although examples of horizontal gene transfer (HGT) between 
Wolbachia and insect hosts are widespread, two major 
observations strongly argue against the possibility of HGT of 

Figure 2 | Phylogenetic tree for Wolbachia 16S rRNA sequences. The tree was built using V4 16S rRNA fragments from Wolbachia reference sequences 
available in public repositories (NCBI and SILVA) and from the sequences obtained in this study. The sequences are clustered at 99% identity and the 
cardinality of each OTU obtained is reported in a logarithmic scale as a bar chart external to the tree (green for reference sequences and red for the 
new sequences). Shadings highlight subtrees with high bootstrapping support (>90%) and host organisms are reported for those OTUs in which at 
least one-third of the sequences are consistently associated to the same host. 



Wolbachia sequences into the An. gambiae genome: (1) with one 
exception, evidence of infection was identified in reproductive 
tissues (ovaries and testes) from nine females and males but not 
in the carcasses dissected from the same individuals, or vice versa, 
arguing against a possible transfer into the mosquito chromosomes; 
(2) the average coverage of Wolbachia in our WGS samples was 
lower than 0.05×, which is incompatible with the three orders of 
magnitude higher coverage of the An. gambiae genome in the 
same samples (>75×). Alternatively, HGT may have occurred 
into a bacterium or eukaryotic microorganism that infects the 
Anopheles germline and is maternally inherited 
(based on the evidence of vertical transmission of 16S 
sequences). Regardless of their origin, the Wolbachia sequences 
identified here may still be sufficient to induce Wolbachia-like 
reproductive phenotypes, such as bidirectional cytoplasmic 
incompatibility, that would impact future field deployments of 
experimental Wolbachia infections. 
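
The coverage argument in point (2) can be checked with a back-of-the-envelope calculation. The sketch below is only illustrative: it assumes Wolbachia genome sizes of roughly 1.1 to 1.5 Mb (an assumption, not a value given in the text) and counts only the 395 uniquely attributable 101-nt reads reported for sample S1, so it is a rough estimate rather than the authors' own computation.

```python
import math

def mean_coverage(n_reads, read_length_nt, genome_size_nt):
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_length_nt / genome_size_nt

# 395 uniquely attributable reads in sample S1 (Fig. 3 legend), 101-nt reads.
# The two genome sizes are assumed bounds for Wolbachia, not values from the text.
for genome_size in (1.1e6, 1.5e6):
    cov = mean_coverage(395, 101, genome_size)
    print(f"genome {genome_size / 1e6:.1f} Mb -> coverage {cov:.3f}x")

# Ratio between the host and Wolbachia coverage values quoted in the text.
ratio = 75 / 0.05
print(f"host/Wolbachia ratio >= {ratio:.0f} (log10 = {math.log10(ratio):.2f})")
```

Under these assumptions the estimated depth stays below 0.05×, and the ratio to the >75× host coverage exceeds three orders of magnitude, as stated.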

The unexpected discovery in the mosquito germline of 
maternally inherited Wolbachia organisms will prompt further 
studies of the ecological, environmental and genetic determinants 
of susceptibility of Anopheles mosquitoes to Wolbachia infections. 
It will also spark critical investigation into whether natural 
Wolbachia-Anopheles associations limit the development of 
Plasmodium parasites in the mosquito host, thus aiding the 
design of novel effective bacterial infection strategies to control 
malaria transmission. 

Figure 3 | Whole-genome shotgun sequencing of a Wolbachia-positive sample identifies a new Wolbachia strain in An. gambiae. Ovaries from a 
Wolbachia-positive female (sample S1) were sequenced using WGS. (a) Percentage (Perc.) identity of 395 short sequences uniquely attributable to 
Wolbachia from sample S1 versus eight sequenced strains. The Wolbachia reference strains and the supergroups are indicated with the corresponding 
percentage of assigned sequences. (b) Distribution of phylogenetic distances between different Wolbachia supergroups, from phylogenies reconstructed 
separately on each of the short sequences universally conserved within Wolbachia genomes (light blue inset in a). (c) Functional classification of Wolbachia 
loci identified by alignment to read sequences, based on NCBI annotated gene functions. (d) Wolbachia phylogeny, comprising the new Wolbachia strain 
wAnga isolated in An. gambiae, reconstructed from the concatenated sequences of b. These analyses show that wAnga is different from all other strains 
sequenced so far. The same analyses for another PCR-positive Wolbachia sample are available in Supplementary Fig. 3. 

Figure 4 | Vertical transmission of Wolbachia from mother to offspring. 
A total of 14 Wolbachia-positive blood-fed An. gambiae females collected 
from houses in VK5 were allowed to lay eggs individually. The progeny of 
the five Wolbachia-positive females (W+) that laid eggs was screened for 
the presence of Wolbachia at the fourth larval developmental stage (L4). 
Red indicates Wolbachia-positive larvae and adults. The number of larvae 
screened is shown for each female (n). 



Methods 

Mosquito collections. Mosquito samples were initially collected during August-September 
2011 in three different sites near Bobo-Dioulasso, Burkina Faso. The 
village of Soumousso (11°00'N; 4°02'W) is located 55 km North-East of 
Bobo-Dioulasso. It is characterized by wooded savannah and by temporary 
breeding sites, more favourable to S form (An. gambiae). The two other collection 
sites are located in Vallee du Kou, a large rice-growing area situated 30 km North-West 
of Bobo-Dioulasso. The village of VK5 (11°23'N; 4°24'W) is completely 
surrounded by rice fields, while VK7 (11°24'N; 04°24'W) is characterized by rice 
fields to the South and by savannah to the North. Because of the irrigation system, 
rice fields form permanent mosquito breeding sites in which the M form (An. 
coluzzii) thrives. Nonetheless, a few transient breeding sites could be found in 
depressions and ponds within the villages. Male and female adult mosquitoes were 
collected in copula from an average of six mating swarms per collection day. 
Fourth instar larvae were also collected from temporary and permanent water pools 
from each site. A schematic representation of the villages and swarm locations is 
provided in Fig. 1. In August 2013, blood-fed An. gambiae females were collected 
inside the houses in VK5 and allowed to individually oviposit in the insectary. 



DNA extraction and species genotyping. Genomic DNA was extracted from 
dissected reproductive tissues (testes and ovaries) using DNeasy kit (Qiagen), and 
from carcasses using NucleoSpin 96 Tissue kit (Macherey-Nagel). In 2013, to 
estimate Wolbachia maternal transmission, DNA was extracted using DNeasy kit 
(Qiagen) from whole females that were allowed to oviposit, and from their pro- 
genies. For M and S genotyping, DNA was extracted from a leg using a fast 
extraction method. In brief, individual legs were incubated in 40 µl of grinding 
buffer (10 mM Tris-HCl pH 8.2, 1 mM EDTA, 25 mM NaCl) with 0.2 mg ml⁻¹ 
proteinase K for 45 min at 37 °C, then 5 min at 95 °C to inactivate the enzyme. 
DNA extracts (1 µl) were then subjected to PCR amplification targeting the locus 
S200 X6.1 using specific primers (FWD: 5'-TGGGGTTAGAGGTTGGGTTA-3'; 
and REV: 5'-CGCTTCAAGAATTCGAGATAC-3'). M and S genotyping was 
also used on larvae and adult carcasses. Larval DNA was used to determine the 
sex using Y-specific primers (F: 5'-CAAAACGACAGCAGTTCC-3'; and R: 
5'-TAAACCAAGTCCGTCGCT-3'). 

16S rRNA profiling and sequencing. The 16S rRNA gene data set consisted of 
Illumina MiSeq sequences targeting the V4 variable region. Detailed protocols used 
for 16S amplification and sequencing were previously described. In brief, 
genomic DNA from testes and ovaries was subjected to 16S rRNA amplifications 
using primers incorporating the Illumina adapters and a sample barcode sequence, 
allowing directional sequencing covering the variable region V4 (515F: 
5'-GTGCCAGCMGCCGCGGTAA-3'; and 806R: 5'-GGACTACHVGGGTWTCTAAT-3'). 
PCR mixtures contained 10 µl of diluted template (1:50), 10 µl of HotMasterMix 
with the HotMaster Taq DNA Polymerase (5 Prime) and 5 µl of primer mix (2 µM 
of each primer). The cycling conditions consisted of an initial denaturation at 94 °C 
for 3 min, followed by 30 cycles of denaturation at 94 °C for 45 s, annealing at 50 °C 
for 60 s, extension at 72 °C for 5 min and a final extension at 72 °C for 10 min. 
Amplicons were quantified on the Caliper LabChipGX (PerkinElmer, Waltham, 
MA), pooled in equimolar concentrations, size selected (375-425 bp) on the Pippin 
Prep (Sage Sciences, Beverly, MA) to reduce nonspecific amplification products 
from host DNA; final library sizing and quantification were done on an Agilent 
Bioanalyzer 2100 with DNA 1000 chips (Agilent Technologies, Santa Clara, CA). 
Sequencing was performed on the Illumina MiSeq v2 platform, according to the 
manufacturer's specifications with addition of 5% PhiX, generating paired-end 
reads of 175 bp in length in each direction. The overlapping paired-end reads were 
stitched together (~97 bp overlap) and size selected to reduce nonspecific 
amplification products from host DNA (225-275 bp). 
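
The read geometry described above (two 175-nt reads overlapping by about 97 nt) implies a merged amplicon of about 2 × 175 − 97 = 253 nt, which falls inside the 225-275 bp window. The sketch below illustrates the stitching and size-selection idea on a synthetic amplicon. The text does not state which tool performed this step, so the function names, the overlap search and the mismatch threshold are all illustrative assumptions.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def stitch_pair(fwd, rev, min_overlap=20, max_mismatch_frac=0.05):
    """Merge a read pair using the longest acceptable suffix/prefix overlap.

    Returns the merged sequence, or None if no overlap passes the threshold.
    Thresholds are hypothetical defaults, not values from the text.
    """
    rev_rc = reverse_complement(rev)
    for ov in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-ov:], rev_rc[:ov]))
        if mismatches <= max_mismatch_frac * ov:
            return fwd + rev_rc[ov:]
    return None

def in_size_window(seq, lo=225, hi=275):
    """Size selection applied to merged reads (225-275 bp in the text)."""
    return seq is not None and lo <= len(seq) <= hi

# Synthetic 253-nt amplicon sequenced as 2 x 175 nt, giving a 97-nt overlap.
rng = random.Random(0)
amplicon = "".join(rng.choice("ACGT") for _ in range(253))
fwd = amplicon[:175]
rev = reverse_complement(amplicon[-175:])
merged = stitch_pair(fwd, rev)
print(len(merged), in_size_window(merged))
```

Real data would also need base-quality handling when the two reads disagree in the overlap, which this sketch omits.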

High-throughput 16S rRNA screening. We first applied the QIIME pipeline 
version 1.6 (ref. 27) to the 16S rRNA data set, which detected in the testis sample 
G23656 (male SMS5.1 in Supplementary Fig. 1) a total of 19.9% of reads assigned 
to Wolbachia. To specifically investigate and validate the prediction about the 
presence of Wolbachia in the sample, we implemented an additional pipeline. We 
first integrated all 16S rRNA sequences assigned to Wolbachia included in the 
SILVA and NCBI repositories; we manually inspected those sequences with a 
>3% nucleotide divergence from any other Wolbachia 16S, which let us exclude 
five mislabelled sequences that in fact match other bacteria such as Bartonella, 
Rickettsia and Francisella. We obtained a set of 2,064 Wolbachia 16S sequences 
from which the 253-nt-long V4 region was extracted and clustered at 99% identity, 
generating 115 operational taxonomic units (OTUs) covering the diversity in the 
Wolbachia genus. The all-versus-all mapping of the 115 OTU sequence 
representatives provided a lower-bound estimate of the genus' total diversity 
(average identity of 95.9%, s.d. 1.75%, median 96.0%). 
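
The clustering and all-versus-all comparison described above can be illustrated with a minimal sketch. The text does not name the clustering algorithm, so a simple greedy centroid scheme is assumed here, and identity is computed position by position on sequences that are assumed to be already aligned and of equal length. All function names and the toy sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, pstdev

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold=0.99):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    centroids, clusters = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return centroids, clusters

def pairwise_identity_stats(representatives):
    """Mean, s.d. and median of all-versus-all identities (needs >= 2 sequences)."""
    vals = [identity(a, b) for a, b in combinations(representatives, 2)]
    return mean(vals), pstdev(vals), median(vals)

def mutate(seq, positions):
    """Toy helper: substitute the base at each listed position."""
    out = list(seq)
    for i in positions:
        out[i] = "A" if out[i] != "A" else "C"
    return "".join(out)

base = "ACGT" * 50               # 200-nt toy sequence
near = mutate(base, [0])         # 99.5% identical to base
far = mutate(base, range(10))    # 95.0% identical to base
centroids, clusters = greedy_cluster([base, near, far])
print(len(centroids), [len(c) for c in clusters])
print(pairwise_identity_stats(centroids))
```

Greedy clustering depends on input order, so dedicated OTU-picking tools usually sort sequences by abundance or length first.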

We then performed the mapping of the 24,800 reads of sample G23656 against 
the full SILVA database, retaining only those sequences with full-length percentage 
identity of at least 95% with a Wolbachia 16S. The resulting 5,412 reads have a best 
hit other than Wolbachia at < 89% identity, confirming that a fifth of the sample 
(21.82%) consists of V4 fragments from Wolbachia 16S rRNA. The percentage 
identity was in the 95.3-97.6% interval, which is fully consistent with the observed 
diversity in the V4 hypervariable region of the Wolbachia 16S rRNA genes (average 
95.9%, s.d. 1.75% as reported above). 
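
The selection logic in this paragraph can be sketched as a small filter. The record fields and function names are hypothetical, and treating the <89% non-Wolbachia threshold as a filter (rather than as an observation made after filtering) is an interpretation made here for illustration.

```python
from dataclasses import dataclass

@dataclass
class ReadHits:
    """Best hits for one read; field names are illustrative."""
    best_wolbachia_identity: float  # percent identity over the full read
    best_other_identity: float      # best percent identity to any non-Wolbachia 16S

def is_wolbachia_v4(read, min_wolbachia=95.0, max_other=89.0):
    """Keep reads close to a Wolbachia 16S and distant from everything else."""
    return (read.best_wolbachia_identity >= min_wolbachia
            and read.best_other_identity < max_other)

def wolbachia_fraction(reads):
    """Return (number of retained reads, percentage of the sample)."""
    kept = sum(is_wolbachia_v4(r) for r in reads)
    return kept, 100.0 * kept / len(reads)

# Toy records (made-up values) showing each outcome of the filter.
reads = [
    ReadHits(96.4, 87.0),  # retained
    ReadHits(97.6, 88.5),  # retained
    ReadHits(91.0, 99.0),  # rejected: too distant from Wolbachia
    ReadHits(95.3, 90.2),  # rejected: too close to another genus
]
print(wolbachia_fraction(reads))

# Percentage implied by the counts reported for sample G23656.
print(f"{100 * 5412 / 24800:.2f}%")
```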

The V4 sequences from sample G23656 were then clustered into OTUs at 99% 
identity, discarding those OTUs with < 50 sequences. The resulting OTUs were 
merged with the Wolbachia OTUs from the SILVA and NCBI repository 
(generated as described above) and aligned with MUSCLE version 3.8.31 (ref. 29). 
A V4 sequence representative of the Rickettsia 16S was added as outgroup. A 
phylogenetic tree was then built using RAxML version 7.4.2 (ref. 30) with the 
GTRGAMMA model, bootstrapping (1,000 replicates), best maximum likelihood 
tree inference, and displayed with GraPhlAn (https://bitbucket.org/nsegata/ 
graphlan) representing the cardinality of the OTUs as circular barplots. 

Wolbachia-specific PCR detection and sequencing. DNA from testes and 
ovaries was rehydrated from the 96-well Qiasafe plate (Qiagen) using 30 µl of 
water. A total of 2 µl of DNA was used for Wolbachia PCR detection using primers 
specific for Wolbachia 16S rDNA (W-Specf 5'-CATACCTATTCGAAGGGATAG-3', 
W-Specr 5'-AGCTTCGAGTGAAACCAATTC-3') following standard procedures. 
Positive samples showed a 438-bp band that was purified with QIAquick 
Gel Extraction kit (Qiagen) and sequenced (Eurofins MWG Operon, Ebersberg, 
Germany). Sample DNA quality was assessed with PCR using primers for the RpS7 
(AGAP010592) An. gambiae gene (FWD 5'-GGCGATCATCATCTACGTGC-3'; 
and REV 5'-GTAGCTGCTGCAAACTTCGG-3'). Similarly, a total of 2 µl of DNA 
was used for PCR detection of wsp and fbpA genes following standard procedures 
(see Supplementary Table 5 for primer sets used). 

WGS sequencing and analysis pipeline. Shotgun metagenomic sequencing was 
performed on two mosquito samples from infected ovaries using the remainder of 
the DNA available after Wolbachia-specific PCR detection (1-2 ng). Due to this 
limiting condition, libraries were prepared with 1 ng DNA according to the Nextera 
XT protocol (Version Oct 2012). Briefly, the DNA was fragmented in 5 µl of 
Amplicon Tagment Mix and 10 µl of Tagment DNA buffer (Illumina, San Diego, 
CA, USA). Tagmentation reactions were completed by incubation for 5 min at 
55 °C followed by neutralization with 5 µl of Neutralise Tagment Buffer for 5 min. 
Tagmented DNA was used as the template in a 50-µl limited-cycle PCR (12 cycles) 
and processed as described in the Nextera XT protocol. Amplified DNA was 
purified with AMPure XP beads and then normalized to 2 nM. Sequencing was 
performed on a HiSeq2000 (Illumina, San Diego, CA, USA) employing one full 
lane per library with 101 bp paired-end reads. 

Raw WGS results consisting of >400M 101-nt-long paired-end reads 
(Supplementary Table 3) were subject to quality control and sliding window 
trimming with a minimum resulting read length of 80 bp, and to sequencing 
artefact removal using PRINSEQ version 0.20.3 (ref. 31). As expected, An. gambiae 
DNA was quantitatively dominant in the read pool, and was removed by 
BowTie2 mapping using the 'very-sensitive' preset option against the An. 
gambiae PEST reference genome (http://www.vectorbase.org/). Supplementary 
Table S2 reports the number of reads that were retained after each 
pre-processing step. 

The resulting read set was mapped against the seven available Wolbachia 
genomes and the high-quality draft assembly of the Wolbachia strain wAlbB 
(Supplementary Table 4). To quantify the expected sequence divergence between 
Wolbachia strains and supergroups, we performed all-versus-all sequence mapping 
(with BLASTN) with all open reading frames (ORFs), considering as pairwise 
common ORFs those sequences with >80% identity over 50% of the ORF length 
(Supplementary Fig. 2). Reads were uniquely attributed to Wolbachia on the basis 
of the concept of unique marker sequences, in a procedure of four steps. First step: 
BowTie2 mapping against the eight Wolbachia reference genomes was performed 
to identify Wolbachia candidate reads. The mapping was performed with enhanced 
sensitivity (--score-min L,-1.0,-1.0 -D 25 -R 5 -N 1 -L 12 -i S,2,0.25) to capture 
Wolbachia sequence divergence and host specificity as assessed by ORFs' sequence 
comparison among available Wolbachia genomes (Supplementary Fig. 3). 
The matches were also confirmed by BLASTN with word size of length 7. 
Second step: candidate reads coming from the small and large ribosomal units (16S 
rRNA and 23S rRNA genes) were screened out by sequence mapping against the 
comprehensive ribosomal sequences in the SILVA database release 111. Third step: 
the remaining ribosomal-free candidate reads were mapped against the full RefSeq 
genomic database version 60 to identify any non-Wolbachia-specific hits using 
BLASTN with word size of length 7. Fourth step: on the basis of the mapping 
results of steps 1 and 3, the final set of reads from the Wolbachia strain identified in 
An. gambiae (wAnga) was compiled selecting all reads showing > 80% identity 
over > 95 nt to at least one Wolbachia strain, and no hits longer than 80 nt to other 
organisms. Reads hitting non-Wolbachia genomes with identities below 80% and at 
least one Wolbachia genome at >90% were also retained. As control, the same 
procedure was also applied to the two genera closest to Wolbachia according to the 
PhyloPhlAn tree of life, namely, Anaplasma (six reference genomes) and 
Ehrlichia (five reference genomes). No uniquely attributable reads were found in 
this analysis. 
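
The fourth step of the procedure above amounts to a decision rule applied to each candidate read. The sketch below encodes that rule under one reading of the text: the second criterion is taken to mean that every non-Wolbachia hit is below 80% identity while at least one Wolbachia hit exceeds 90%. The `Hit` record and function name are hypothetical, and real input would come from parsed BowTie2/BLASTN results.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    identity: float  # percent identity of the alignment
    length: int      # aligned length in nucleotides

def is_uniquely_wolbachia(wolbachia_hits, other_hits):
    """Apply the step-4 selection criteria to the hits of a single read."""
    # Criterion 1: >80% identity over >95 nt to a Wolbachia strain,
    # and no hit longer than 80 nt to any other organism.
    strong_wolbachia = any(h.identity > 80 and h.length > 95 for h in wolbachia_hits)
    no_long_other = not any(h.length > 80 for h in other_hits)
    if strong_wolbachia and no_long_other:
        return True
    # Criterion 2 (interpretation): all non-Wolbachia hits below 80% identity
    # and at least one Wolbachia hit above 90% identity.
    weak_other_only = bool(other_hits) and all(h.identity < 80 for h in other_hits)
    very_strong_wolbachia = any(h.identity > 90 for h in wolbachia_hits)
    return weak_other_only and very_strong_wolbachia

# Toy examples with made-up hit values.
print(is_uniquely_wolbachia([Hit(86.0, 101)], []))                # criterion 1
print(is_uniquely_wolbachia([Hit(93.0, 101)], [Hit(75.0, 100)]))  # criterion 2
print(is_uniquely_wolbachia([Hit(86.0, 101)], [Hit(95.0, 100)]))  # rejected
```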

wAnga reads mapping to all seven Wolbachia genomes were then retained for 
phylogenetic analysis. The homologous sequences were extracted and aligned to 
each wAnga read with MUSCLE version 3.8.31 (ref. 29) and the alignments edited 
to remove leading and ending gaps. Sequence-specific phylogenetic trees were built 
using RAxML version 7.4.2 (ref. 30) with the GTRGAMMA model, bootstrapping 
(1,000 replicates) and best maximum likelihood tree inference. Sequence-specific 
phylogenetic distances were computed inferring the patristic distances within each 
tree, and reported with box plots in Fig. 3b and Supplementary Fig. 3b. A final 
phylogenetic tree was also built using RAxML (GTRGAMMA model, 1,000 
bootstrapping replicates) on the concatenated alignments (Fig. 3d, Supplementary 
Fig. 3d). 
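
A patristic distance is the sum of the branch lengths along the path that joins two tips of a tree. The minimal sketch below computes it on a toy rooted tree stored as a child-to-parent map; the taxon names and branch lengths are invented for illustration, and a real analysis would first parse the Newick trees written by RAxML.

```python
def path_to_root(parent, node):
    """Return {ancestor: cumulative branch length from node}, node included."""
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        up, length = parent[node]
        total += length
        dist[up] = total
        node = up
    return dist

def patristic_distance(parent, leaf_a, leaf_b):
    """Sum of branch lengths on the path joining two leaves."""
    from_a = path_to_root(parent, leaf_a)
    node, total = leaf_b, 0.0
    # Climb from leaf_b until reaching an ancestor shared with leaf_a.
    while node not in from_a:
        up, length = parent[node]
        total += length
        node = up
    return from_a[node] + total

# Toy rooted tree: child -> (parent, branch length). Values are made up.
toy_tree = {
    "wAnga": ("root", 0.30),
    "AB": ("root", 0.10),
    "supergroup_A": ("AB", 0.05),
    "supergroup_B": ("AB", 0.07),
}
print(patristic_distance(toy_tree, "supergroup_A", "supergroup_B"))
print(patristic_distance(toy_tree, "wAnga", "supergroup_A"))
```

Collecting such distances across the per-sequence trees gives the distributions that a box plot would summarize.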